UPDATING THE GANZFELD DATABASE: 
A VICTIM OF ITS OWN SUCCESS? 

By Daryl J. Bem, John Palmer, and Richard S. Broughton 


ABSTRACT: The existence of psi (anomalous processes of information transfer such as 
telepathy or clairvoyance) continues to be controversial. Earlier meta-analyses of studies 
using the ganzfeld procedure appeared to provide replicable evidence for psi (D. J. Bem 
& C. Honorton, 1994), but a follow-up meta-analysis of 30 more recent ganzfeld studies 
did not (J. Milton & R. Wiseman, 1999). When 10 new studies published after the 
Milton-Wiseman cutoff date are added to their database, the overall ganzfeld effect again 
becomes significant, but the mean effect size is still smaller than those from the original 
studies. Ratings of all 40 studies by 3 independent raters reveal that the effect size achieved 
by a replication is significantly correlated with the degree to which it adhered to the 
standard ganzfeld protocol. Standard replications yield significant effect sizes comparable 
with those obtained in the past. 


The term psi denotes anomalous processes of information transfer 
such as telepathy and other forms of extrasensory perception that are 
currently unexplained in terms of known physical or biological mechanisms. 
The question of whether psi actually exists continues to be controversial. 
In 1994, Bem and Honorton summarized meta-analyses of approximately 
50 studies from 10 separate laboratories that appeared to 
provide replicable evidence for psi using an experimental protocol 
known as the ganzfeld procedure. 

In most studies using the ganzfeld procedure, two participants — a 
“sender” and a “receiver” — are sequestered in separate, acoustically iso- 
lated rooms. For approximately 30 min, the sender concentrates on a 
randomly selected stimulus target, for example, an art print, a photo- 
graph, or a brief videotaped sequence. During the same period, the re- 
ceiver is immersed in a mild form of perceptual isolation called the 
ganzfeld (total field) while providing a continuous verbal report of his or 
her ongoing thoughts, feelings, and images. At the completion of the 


An earlier version of this article was presented at the 43rd Annual Convention of the 
Parapsychological Association (Palmer & Broughton, 2000). We wish to thank Richard 
Eibach, Nicholas Epley, and Thomas Keegan from the Department of Psychology, Cornell 
University, for serving as raters, and Paul Blue, administrative assistant at the Rhine Re- 
search Center, for coding and randomizing the rating materials. 
ganzfeld period, the receiver is shown several stimuli (usually four) and, 
without knowing which stimulus was the target, is asked to rate the degree 
to which each matches the thoughts, feelings, and images experienced 
during the ganzfeld period. If the receiver assigns the highest rating to 
the target stimulus, it is scored as a hit. Thus, if the experiment uses judg- 
ing sets containing four stimuli (the target and three decoys or control 
stimuli), the hit rate expected by chance is 25%.¹ 

In their article, Bem and Honorton (1994) reported a hit rate of 35% 
(p < 10⁻⁹) for 28 ganzfeld studies conducted between 1974 and 1981 and a 
hit rate of 32% (p = .0008) for 10 computer-controlled (“autoganzfeld”) 
studies conducted between 1983 and 1989 that had been specifically 
designed to eliminate methodological flaws identified in some of the earlier 
studies. 

More recently, Milton and Wiseman (1999) published a follow-up 
meta-analysis of 30 additional ganzfeld studies that had been conducted 
from 1987 through 1997. They concluded that these studies did not yield 
an overall significant effect, thereby calling into question the replicability 
of the ganzfeld procedure (see Storm & Ertel, 2001, for a critique of that 
meta-analysis). Milton subsequently organized and initiated an Internet 
debate of the ganzfeld research, a debate that was edited for publication 
by Schmeidler and Edge (1999). In her own contribution to that debate, 
Milton (1999) noted that when replications published after the 
Milton-Wiseman cutoff date are added to the database, the accumulated 
studies do, in fact, achieve statistical significance. Even so, the 
mean effect size of these more recent studies is still significantly smaller 
than those reported by Bem and Honorton for the two earlier databases. 

The z scores of the studies in the Milton-Wiseman database are significantly 
heterogeneous, and one of the observations made during the online 
debate was that several studies contributing negative z scores to the analysis 
had used procedures that deviated markedly from the standard ganzfeld 
protocol. Such a development is neither bad nor unexpected. Many psi 
researchers believe that the reliability of the basic procedure is sufficiently 
well established to warrant using it as a tool for the further exploration of 
psi. Thus, rather than continuing to conduct exact replications, they have 
been modifying the procedure and extending it into unknown territory. 
Not unexpectedly, such deviations from exact replication are at increased 
risk for failure. For example, rather than using visual stimuli, Willin 
(1996a, 1996b) modified the ganzfeld procedure to test whether senders 
could communicate musical targets to receivers. They could not. When 
such studies are thrown into an undifferentiated meta-analysis, the overall 


¹ Some studies using the standard ganzfeld procedure eliminate the sender to test for 
a psi process that does not involve anomalous communication between two people. 
effect size is thereby reduced and, perversely, the ganzfeld procedure becomes 
a victim of its own success. 

In the present study, we sought to test this explanation for the appar- 
ent decline in ganzfeld effect sizes. Three independent raters unfamiliar 
with the recent ganzfeld studies and uninformed as to the studies’ out- 
comes rated the degree to which each of the recent studies deviated from 
the standard ganzfeld protocol. The database was then reexamined to test 
the hypothesis that effect sizes are positively correlated with the degree to 
which the experimental procedures adhere to the standard protocol. 


Method 


Studies Included in the Analysis 

In addition to the 30 studies analyzed by Milton and Wiseman (1999), 
an additional 10 studies were located by examining the six major publication 
outlets for parapsychological research. Many of these studies had been 
completed but not yet published prior to the cutoff date set by Milton and 
Wiseman for their meta-analysis. Following Milton and Wiseman, we treated 
separate experimental series within a given report separately but not experi- 
mental conditions within a given series. Two studies in the Milton-Wiseman 
sample that were originally reported in the Parapsychological Association’s 
Proceedings of Presented Papers were replaced by their published reports in ar- 
chival journals. These substitutions did not affect the statistical outcomes re- 
ported by Milton and Wiseman for these studies. Table 1 lists all 40 studies, 
with the 10 new studies identified by asterisks. 

Raters 

Three advanced graduate students in psychology at Cornell Univer- 
sity were recruited by the first author to serve as raters. All have had con- 
siderable experience designing and conducting laboratory experiments 
in social psychology. Their prior familiarity with the ganzfeld procedure 
was limited to having read Bem and Honorton’s (1994) article or having 
heard Bem present the information from that same article in a colloquium 
or lecture. They were not acquainted with any of the 40 subsequent 
studies they were asked to rate. 

Rating Materials 

The method sections for the 40 studies to be rated were first edited to 
eliminate all article titles, authors, hypotheses, references to results of 
other experiments in the sample, and descriptions of psychological tests 
(except those given during the ganzfeld or used for participant selec- 
tion). The edited method sections were then photocopied and assem- 
bled into judging packets. 
Table 1 

Number of Trials, z Score, Effect Size (ES), Hit Rate, and 
Standardness Rating for Each Study in the Updated Ganzfeld 
Database (Arranged in Order of Decreasing Standardness) 

Study                                                     Trials  z score     ES   Hit rate %   Standardness
Bierman et al. (1993) - Series I                              50     0.03   0.00         26.0           7.00
Bierman et al. (1993) - Series II                             50    -0.30  -0.04         24.0           7.00
Broughton & Alexander (1997) - First Timers Series 1ᵃ         50    -0.30  -0.04         24.0           7.00
Broughton & Alexander (1997) - First Timers Series 2ᵃ         50    -1.33  -0.19         18.0           7.00
Broughton & Alexander (1997) - Emotionally Close Seriesᵃ      51     1.81   0.25         37.3           7.00
Dalton (1994)                                                 29     1.76   0.33         41.4           7.00
*Dalton (1997)                                               128     5.20   0.46         46.9           7.00
Morris et al. (1993) - Cunningham Study                       32     1.78   0.31         40.6           7.00
*Alexander & Broughton (1999)                                 50     1.60   0.23         36.0           6.67
Broughton & Alexander (1997)ᵃ - Clairvoyance Series           50    -0.64  -0.09         22.0           6.67
Broughton & Alexander (1997)ᵃ - General Series                 8     0.46   0.16         37.5           6.67
Kanthamani & Broughton (1994) - Series 3                      40    -0.91  -0.14         20.0           6.67
Kanthamani & Broughton (1994) - Series 4                      65     2.01   0.25         36.9           6.67
Parker et al. (1997) - Study 2ᵇ                               30     1.25   0.23         36.7           6.67
Parker et al. (1997) - Study 3ᵇ                               30     1.25   0.23         36.7           6.67
*Parker & Westerlund (1998) - Study 4                         30     2.40   0.44         46.7           6.67
*Parker & Westerlund (1998) - Study 5                         30     1.25   0.23         36.7           6.67
Kanthamani & Palmer (1993)                                    22    -2.17  -0.46          9.1           6.33
Morris et al. (1995)                                          97     1.67   0.17         33.0           6.33
Kanthamani & Broughton (1994) - Series 8                      50     0.03   0.00         26.0           6.00
Morris et al. (1993) - McAlpine Study                         32    -0.17  -0.03         25.0           6.00
Stanford & Frank (1991)                                       58    -1.24  -0.16         19.0ᵈ          5.67
Kanthamani & Broughton (1994) - Series 7                      46     0.03   0.00         26.1           5.33
McDonough et al. (1994)                                       20     1.02   0.23         30.0           5.33
Parker et al. (1997) - Study 1ᵇ                               30    -0.83  -0.15         20.0           5.33
Williams et al. (1994)                                        42    -2.30  -0.35         11.9           5.33
*Wezelman et al. (1997)                                       32     2.15   0.38         43.8           4.67
Bierman (1995) - Series III                                   40     1.94   0.31         40.0           4.33
Bierman (1995) - Series IV                                    36     1.33   0.22         36.1           4.33
*Symmons & Morris (1997)                                      51     2.97   0.42         45.1           4.00
*Wezelman & Bierman (1997) - Series IV                        32    -1.45  -0.26         15.6           4.00
Kanthamani & Khilji (1990) - Series 6bᶜ                       40     0.52   0.08         30.0ᵈ          3.67
Kanthamani & Broughton (1992) - Series 6aᶜ                    20    -0.46  -0.10         25.0ᵈ          3.33
*Parker & Westerlund (1998) - Serial Study                    30    -0.49  -0.09         23.0ᵈ          3.33
*Wezelman & Bierman (1997) - Series V                         40    -0.91  -0.14         20.0           3.00
*Wezelman & Bierman (1997) - Series VI                        40    -0.15  -0.02         25.0           3.00
Kanthamani et al. (1988) - Series 5aᶜ                          4     0.22   0.11         50.0           2.67
Kanthamani et al. (1988) - Series 5bᶜ                         10    -2.06  -0.65         10.0ᵈ          2.67
Willin (1996a)                                               100    -0.33  -0.03         24.0           1.33
Willin (1996b)                                                16    -0.24  -0.06         25.0           1.33

Note: Asterisks denote studies added to Milton and Wiseman (1999). 

ᵃ Cited as Broughton and Alexander (1996) in Milton and Wiseman (1999). 
ᵇ Cited as Johansson and Parker (1995) in Milton and Wiseman (1999). 
ᶜ Series summarized and numbered in Kanthamani and Broughton (1994). 
ᵈ Hit rate not reported. Estimated from z score. 

Because there were four instances in which the methods were identi- 
cal for two separate series, there were only 36 separate method sections for 
the 40 studies. Also, because some method sections referred back to the 
method sections of previous series in the same article, some series were 
bundled together, creating 20 separate packets containing the 36 method 
sections. An assistant not otherwise involved in the study assigned code 
numbers to each method section and then randomly ordered the se- 
quence of 20 packets differently for each rater. The coding procedure en- 
abled us to examine the reliability and distribution of ratings while remain- 
ing blind to which ratings were assigned to which studies. 

A rating sheet was stapled to the front of each method section. It con- 
sisted of a 7-point scale with 1 = standard and 7 = nonstandard. For purposes 
of exposition, we subtracted each rating from 8 so that higher ratings 
would correspond to greater adherence to the standard ganzfeld protocol. 
Blank spaces underneath the scale permitted the raters to specify the 
methodological deviations that influenced their ratings. 
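
The reversal of the rating scale is simple arithmetic. As a minimal sketch (the function name and the example raw ratings are hypothetical and are not taken from the study), a standardness score can be formed as the mean of the three raters' reversed ratings:

```python
from statistics import mean

SCALE_MAX = 7  # the rating sheet used a 7-point scale (1 = standard, 7 = nonstandard)


def standardness(raw_ratings):
    """Reverse each raw rating (8 - rating) and average across raters.

    Higher values then mean closer adherence to the standard protocol.
    """
    return mean((SCALE_MAX + 1) - r for r in raw_ratings)


# Hypothetical raw ratings from three raters for one method section.
print(round(standardness([1, 1, 2]), 2))
```

Averaging three whole-number ratings is why the standardness scores fall on thirds (for example 6.67, 6.33, 6.00).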

Rating Instructions 

The Internet debate implied that parapsychologists actively involved 
in ganzfeld research would be unlikely to agree on a single definition of 
the standard ganzfeld procedure. Rather than provide our own ad hoc 
definition, we had the raters read the general description from the section 
labeled “The Ganzfeld Procedure” in Bem and Honorton’s (1994, 
pp. 5-6) report as well as most of the detailed method section describing 
the computer-controlled autoganzfeld procedure used in Honorton’s 
Psychophysical Research Laboratories (PRL) published in the Journal of 
Parapsychology (Honorton et al., 1990, pp. 102-110). They were further 
instructed that the Bem-Honorton description 

specifies the main ingredients of the standard ganzfeld method, 
and these elements must be included in any ganzfeld procedure 
if it is to be considered purely standard. You will note that for a 
few procedural elements the section says that they are used “most 
often,” “typically,” or something to that effect. In these instances, 
the opposite procedure can still be considered standard. For ex- 
ample, the page states that “most often” the procedure includes 
a sender (telepathy). However, the minority of studies that did 
not use a sender (clairvoyance) can still be considered standard. 
Deviant elements can either be substitutes for standard elements 
or additions to them. 

With regard to the PRL autoganzfeld procedure, the raters were told 
that the experiments 

need not conform to all the details of this protocol to be consid- 
ered standard, but procedures cited in this section should not be 
considered non-standard if they are incorporated in the studies 
you will be rating. (Note: One feature of the PRL experiment not 
mentioned in its methodological description is that the experi- 
menter, while still blind to the target, sometimes helped the sub- 
ject do the judging.) 

You should take note of authors’ declarations that their proce- 
dures were standard or non-standard, but you are not bound by 
such declarations. 

You should treat as standard the use of artistic or “creative” subject 
samples (since one of the most successful components of the PRL 
experiment used such a sample) or subjects having had previous 
psi experiences or having practiced a mental discipline such as 
meditation (since such subjects were shown to be the best scorers 
in the PRL experiment). 

There are a few kinds of deviations you should not count at all. 

Do not pay attention to psychological tests that might have been 
given to the subjects, unless they are given while the subjects are 
actually in the ganzfeld or influence the selection of subjects. 
Even in these cases it is up to you to decide how much, if any, 
such factors make the method non-standard. Also, do not con- 
sider sample size or the method of statistical analysis. Finally, do not 
count deviations the only effect of which is to influence the 
likelihood of artifacts, such as sensory leakage of the target information. 
Such deviations are important in the broader scheme of 
things, but not for this exercise. 

You should base your judgment of standardness not only on the 
number of deviant elements but also on their importance. Judg- 
ments of importance should reflect how likely you think it is 
that the deviant element might have influenced the results, 
based on common sense and your understanding of how such 
judgments are made for other kinds of psychology experi- 
ments. In so doing, you should pay attention to the rationale or 
theory parapsychologists have developed to explain why the 
ganzfeld should facilitate high ESP scores (although lack of 
such relevance does not preclude a deviant element from being 
important). You will find that the Psychological Bulletin article 
discusses this rationale.² 

Raters were not permitted to consult with one another while making 
their ratings although they were permitted to seek clarification of the in- 
structions from the first author. None did, however. 


Results and Discussion 


Basic Update 

Table 1 presents the z scores and effect sizes for all 40 studies in the 
sample. Milton and Wiseman’s (1999) own figures were used for the 30 
studies in their analysis, and their statistical procedures were duplicated 
to the extent possible for the 10 new studies. In cases in which the number 
of direct hits was reported, an exact binomial probability was computed 
and converted to a one-tailed z score. In three studies (Symmons 
& Morris, 1997; Wezelman & Bierman, 1997, Series V and VI), hits were 
reported for both receiver judges and outside judges. In these cases, z 
scores were computed for both counts and averaged. This was the procedure 
Milton and Wiseman (1999) apparently used in the most comparable 
case from their survey (McDonough, Don, & Warren, 1994). In 
the Serial Study of Parker and Westerlund (1998), the total number of 
hits for the 30 participants, averaged over the four trials per session, was 
calculated to be 6.75, and the binomial probability of this value was obtained 
using a .75 interpolation between 6 and 7. Effect sizes were calculated 
using the formula employed by Milton and Wiseman (1999), 
z/N^(1/2) (hereinafter labeled ES). 
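
The conversion from a direct-hit count to a z score and an effect size can be sketched as follows. This is an illustrative reading of the procedure described above, not the authors' own code: the function names are hypothetical, and the sketch assumes that the one-tailed probability is the upper binomial tail P(X >= hits) with a chance hit rate of .25.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_z(hits, trials, p_chance=0.25):
    """One-tailed z score from the exact binomial tail P(X >= hits).

    Assumes 0 < hits <= trials; a tail probability of exactly 1
    (hits == 0) has no finite z score.
    """
    p_tail = sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )
    # Convert the upper-tail probability to a standard normal deviate.
    return NormalDist().inv_cdf(1.0 - p_tail)


def effect_size(z, trials):
    """ES = z / sqrt(N), the formula attributed to Milton and Wiseman (1999)."""
    return z / sqrt(trials)


z = binomial_z(hits=4, trials=16)
print(round(z, 2), round(effect_size(z, 16), 2))
```

With 4 hits in 16 trials (a 25.0% hit rate), the sketch gives z of about -0.24 and ES of about -0.06, which agrees with the ES of -0.06 listed in Table 1 for Willin (1996b).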

The 10 new ganzfeld replication studies yield an overall hit rate of 
36.7%, ES = .17, Stouffer Z = 3.97, p = 3.5 × 10⁻⁵, one-tailed. All 40 


² Complete instructions to the raters can be obtained from the authors. 
replication studies combined yield an overall hit rate of 30.1%, ES = .051, 
Stouffer Z = 2.59, p = .0048, one-tailed. This latter set of figures thus represents 
the current status of ganzfeld studies published after those summarized 
in Bem and Honorton (1994). By this measure, then, the ganzfeld effect 
remains replicable, but the mean effect size for these 40 studies falls 
below the 95% confidence intervals for both the 39 preautoganzfeld studies 
(.080 to .328) and the 10 previous autoganzfeld studies (.059 to .269).³ 
Accordingly, we now turn to our hypothesis that the effect sizes of the 
ganzfeld replications are moderated by the degree to which their experi- 
mental procedures adhere to the standard ganzfeld protocol. 
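
The Stouffer Z reported for the 10 new studies can be reproduced from the z scores of the asterisked rows in Table 1. The following is a minimal sketch, assuming the unweighted Stouffer method (sum of the z scores divided by the square root of the number of studies); the variable and function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# z scores of the 10 asterisked (new) studies, transcribed from Table 1.
NEW_STUDY_Z = [5.20, 1.60, 2.40, 1.25, 2.15, 2.97, -1.45, -0.49, -0.91, -0.15]


def stouffer_z(z_scores):
    """Unweighted Stouffer Z: sum of z scores over sqrt(number of studies)."""
    return sum(z_scores) / sqrt(len(z_scores))


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 1.0 - NormalDist().cdf(z)


combined = stouffer_z(NEW_STUDY_Z)
print(round(combined, 2), one_tailed_p(combined))
```

Under these assumptions the combined Z rounds to 3.97 and the one-tailed p is approximately 3.5 × 10⁻⁵, matching the reported figures.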

Standard Versus Nonstandard Replications 

The “standardness” ratings of the three raters achieved a Cronbach’s 
alpha of .78. The mean of the three sets of ratings on the 7-point scale was 
5.33, where higher ratings correspond to greater adherence to the standard 
ganzfeld protocol. As hypothesized, the degree to which a replication 
adheres to the standard ganzfeld protocol is positively and significantly 
correlated with ES, r_s(38) = .31, p = .024, one-tailed. 
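
The statistic r_s is a Spearman rank correlation, with 38 degrees of freedom (40 studies minus 2). As a hedged sketch of how such a coefficient can be computed when there are many tied ratings, the code below assigns average ranks to ties and then takes the Pearson correlation of the ranks. The function names are hypothetical, and nothing here is claimed to be the software the authors actually used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman r_s as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

The intended use is to pass the Standardness column of Table 1 as one argument and the ES column as the other.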

This same outcome can be observed by defining as standard the 29 
replications whose ratings fell above the midpoint of the scale (4) and defining 
as nonstandard the 9 replications that fell below the midpoint (2 
replications fell at the midpoint): The standard replications obtain an 
overall hit rate of 31.2%, ES = .096, Stouffer Z = 3.49, p = .0002, one-tailed. 
In contrast, the nonstandard replications obtain an overall hit rate of 
only 24.0%, ES = -.10, Stouffer Z = -1.30, ns. The difference between the 
standard and nonstandard replications is itself significant, U = 190.5, p = 
.020, one-tailed. Most importantly, the mean effect size of the standard 
replications falls within the 95% confidence intervals of both the 39 
preautoganzfeld studies and the 10 autoganzfeld studies summarized by 
Bem and Honorton (1994). In other words, ganzfeld studies that adhere 
to the standard ganzfeld protocol continue to replicate with effect sizes 
comparable with those obtained in previous studies. 
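
The midpoint split can be checked against Table 1. The sketch below pairs each study's standardness rating with its z score and recomputes the group sizes and Stouffer Z values. Two caveats apply: the pairs are a transcription of Table 1, and two entries that are garbled in the scanned table (the 3.67 rating for Kanthamani & Khilji, 1990, and the z score of -0.24 for Willin, 1996b) are reconstructions inferred from the other columns, not values read directly from the source.

```python
from math import sqrt

# (standardness rating, z score) pairs in Table 1 order.
# The 3.67 rating and the final z of -0.24 are reconstructed values (see above).
STUDIES = [
    (7.00, 0.03), (7.00, -0.30), (7.00, -0.30), (7.00, -1.33), (7.00, 1.81),
    (7.00, 1.76), (7.00, 5.20), (7.00, 1.78), (6.67, 1.60), (6.67, -0.64),
    (6.67, 0.46), (6.67, -0.91), (6.67, 2.01), (6.67, 1.25), (6.67, 1.25),
    (6.67, 2.40), (6.67, 1.25), (6.33, -2.17), (6.33, 1.67), (6.00, 0.03),
    (6.00, -0.17), (5.67, -1.24), (5.33, 0.03), (5.33, 1.02), (5.33, -0.83),
    (5.33, -2.30), (4.67, 2.15), (4.33, 1.94), (4.33, 1.33), (4.00, 2.97),
    (4.00, -1.45), (3.67, 0.52), (3.33, -0.46), (3.33, -0.49), (3.00, -0.91),
    (3.00, -0.15), (2.67, 0.22), (2.67, -2.06), (1.33, -0.33), (1.33, -0.24),
]
MIDPOINT = 4.0  # midpoint of the 7-point scale


def stouffer_z(z_scores):
    """Unweighted Stouffer Z: sum of z scores over sqrt(number of studies)."""
    return sum(z_scores) / sqrt(len(z_scores))


standard = [z for rating, z in STUDIES if rating > MIDPOINT]
nonstandard = [z for rating, z in STUDIES if rating < MIDPOINT]
at_midpoint = [z for rating, z in STUDIES if rating == MIDPOINT]

print(len(standard), len(at_midpoint), len(nonstandard))
print(round(stouffer_z(standard), 2), round(stouffer_z(nonstandard), 2))
```

With these values the split yields 29 standard, 2 midpoint, and 9 nonstandard replications, and the two Stouffer Z values round to 3.49 and -1.30, in agreement with the text.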

It is true, of course, that the preautoganzfeld studies were themselves 
methodologically diverse and may have included some studies that would 
have been rated as nonstandard by our raters. If such studies were to be 
excluded from the preautoganzfeld database, it is conceivable that the 
new replications would not fall inside the preautoganzfeld confidence 


³ For purposes of effect-size comparisons, we have included in the preautoganzfeld 
database 11 additional studies that Honorton (1985) had set aside because the investiga- 
tors had not reported direct hit rates. This brings the total number of studies in the 
preautoganzfeld database to 39 (mean ES = .20). Details of how we calculated the effect 
sizes for these additional studies can be obtained from the authors. 
limits. This possibility can only be assessed by a separate standardness 
analysis of the preautoganzfeld database. 

As noted earlier, our raters were instructed that “for a few procedural 
elements the [method] section says that they are used ‘most often,’ ‘typi- 
cally,’ or something to that effect. In these instances, the opposite proce- 
dure can still be considered standard.” By implication, this would also in- 
clude procedural variations that the previous meta-analyses had 
suggested were psi-conducive, such as the use of dynamic rather than 
static targets or the pairing of friends to serve as sender and receiver. 
(Both of these experimental variables were mentioned in the method 
sections read by our raters.) Thus, a replication study that used only dy- 
namic targets to enhance the probability of successful replication would 
still be considered standard under these instructions. 

Analogously, we instructed our raters to treat as standard the 
preselection of participants who were artistic or creative, who reported 
previous psi experiences, or who practiced a mental discipline such as 
meditation. Even though these participant variables were not discussed 
in the particular methodological excerpts read by our raters, they were 
explicitly identified elsewhere in Bem and Honorton (1994, p. 13) as potentially 
psi-conducive on the basis of the previous meta-analyses. And, in 
fact, several of the 40 replications listed in Table 1 preselected their par- 
ticipants on some or all of these criteria specifically to enhance the proba- 
bility of successful replication. Accordingly, it was our judgment that it 
would be nonsensical to have our raters treat the use of such preselection 
criteria as a departure from the standard procedure. 

Perhaps there is some merit in continuing to conduct exact replica- 
tions of the ganzfeld procedure, but genuine progress in understanding 
psi rests on investigators’ being willing to risk replication failures by mod- 
ifying the procedure in any way that seems best suited for exploring new 
domains or answering new questions. (Milton, 1999, suggested the possi- 
bility of having researchers state in advance of conducting a study — and 
therefore not knowing the results — whether they wished the study to be 
part of a future proof-oriented meta-analysis.) In any case, future 
meta-analyses should distinguish “standard” replications from nonstan- 
dard extensions of the ganzfeld procedure lest it become a victim of its 
own success. 